Grid technologies aim to enable coordinated resource sharing and problem solving over local
and wide area networks, spanning locations, organizations, machine architectures and software
boundaries. The heterogeneity of the resources involved and the need for interoperability among
different grid middlewares require a shared, common information model. Abstractions of the
different flavors of resources and services, and conceptual schemas of domain-specific entities,
require a collaborative effort so that information services can cooperate coherently.

In this paper, we present the results of our experience in modelling grid resources and services
within the Grid Laboratory Uniform Environment (GLUE) effort, a collaboration of US and EU
High Energy Physics projects towards grid interoperability. We present the first
implementation-neutral agreement on services, such as batch computing and storage managers,
and on resources, such as the cluster, sub-cluster and host hierarchy and the storage library.
Design guidelines and operational results are described, together with open issues and future
evolutions.



I. INTRODUCTION 

Grid technologies aim to enable coordinated resource
sharing and problem solving over local and wide area
networks, spanning locations, organizations, machine
architectures and software boundaries. The heterogeneity
of the resources involved and the need for
interoperability among different grid middleware
solutions require a shared, common information model
in order to enable both intra- and inter-grid resource
awareness.

The research area of computing in High Energy and
Nuclear Physics (HENP) is populated by several
Grid-related projects that mostly rely on basic services
provided by the Globus Toolkit [9] and the Condor
Project [23]. Because these functionalities are widely
adopted, interoperability issues mostly concern what is
built on top of these components. To enable
interoperability among HENP Grid middlewares, the Grid
Laboratory Uniform Environment (GLUE) collaboration,
a joint effort of US and EU High Energy Physics
projects, has been set up.

One of the main achievements of this collaboration
has been carried out in the context of the GLUE
Schema activity. Its main purpose was to define a
common resource information model to be used as
a base for the Grid Information Service (GIS), for both
resource discovery and monitoring activities. Starting
from the Globus MDS schema [12] and the EU
DataGrid (EDG) schema, the first implementation-neutral
agreement on services (such as batch computing
and storage managers) and systems (such as the
cluster, sub-cluster and host hierarchy and the storage
library) has been defined [13]. The EU DataTAG
(EDT) project [4] has contributed to the collection
of requirements from several projects (mainly EDG)
and has developed both the implementation-neutral
description, by means of Unified Modeling Language
(UML) class diagrams, and the schema implementation
for the LDAP data model [17].

In this paper, we recall the outcomes of the
collaboration in which we have participated, and we also
suggest refinements and improvements based on our
experience of analysis, implementation and
deployment within the EDG [5], EDT and LCG [6]
testbeds.

We group the entities involved into two main
categories:

• System: a set of connected items or devices 
which operate together as a functional whole. 

• Service: actions that form a coherent whole from 
the point of view of service providers and service 
requesters. 

From the viewpoint of discovery and monitoring,
the distinction between systems and services is
fundamental: even though they are strongly related
(systems provide services), they have different life
cycles and different status-related attributes. The main
focus was on the service level, in order to enable
efficient service selection. Recently, within the DataTAG
project, the host system has been extended
in order to improve monitoring capabilities.

This paper is organized as follows: in Section II we
describe entities within the system category, while in
Section III services are discussed. In both cases, the
defined concepts are recalled and feedback from our
deployment experience is described. In Section IV the
implementation results are presented, while in
Section V related work is mentioned and compared to
the GLUE Schema. Finally, in Section VI conclusions
and plans for future work are outlined.



II. MODELLING SYSTEMS 

As mentioned in the introduction, a system is here
defined as a set of connected items or devices which
operate together as a functional whole. Within the
GLUE Schema, two main system categories have
been defined: cluster systems, providing computing
services, and storage systems, providing storage
space.



A. Cluster Systems 

A cluster is essentially a container that groups
together computing nodes (hosts), such as a computing
farm. Since, in the context of computing in HENP, a
cluster is composed of many nodes, the concept of
subcluster was introduced in order to avoid poor
performance in the resource discovery process. A
subcluster represents a 'homogeneous' collection of
computing nodes, where homogeneity means that a set
of node attributes (freely selected among those defined
for the single node entity) have the same value on
every node of the collection. For example, a
subcluster could represent a set of nodes with the same
architecture, operating system, CPU model, etc. It
must be stressed that the elements of a subcluster are
homogeneous only with respect to the chosen
attributes. Subclusters therefore provide a convenient
way of representing collections of nodes, useful in
the resource discovery process. The host (computing
node) element represents detailed information (related
to both hardware and software) about a specific node.

In summary, a cluster is a set of nodes, and nodes
can be partitioned into disjoint sets called subclusters,
for which a summary description is available.
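
As an illustration only, the partition of a cluster into
subclusters can be sketched in Python. The class name,
the helper functions and the attribute names (arch, os,
ram_mb) are assumptions made for this example and are
not taken from the GLUE Schema documents.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """One computing node; attribute names are illustrative only."""
    name: str
    attributes: tuple  # sorted tuple of (key, value) pairs


def make_host(name, **attrs):
    """Build a Host from keyword attributes (hypothetical helper)."""
    return Host(name=name, attributes=tuple(sorted(attrs.items())))


def partition_into_subclusters(hosts, chosen_keys):
    """Group hosts so that every group agrees on all chosen_keys.

    Each host lands in exactly one group, so the groups are disjoint
    and together cover the whole cluster.
    """
    groups = defaultdict(list)
    for host in hosts:
        attrs = dict(host.attributes)
        # The shared values act as the summary description of the group.
        summary = tuple((key, attrs.get(key)) for key in chosen_keys)
        groups[summary].append(host.name)
    return dict(groups)


cluster = [
    make_host("node01", arch="i686", os="Linux", ram_mb=512),
    make_host("node02", arch="i686", os="Linux", ram_mb=1024),
    make_host("node03", arch="ia64", os="Linux", ram_mb=2048),
]

subclusters = partition_into_subclusters(cluster, chosen_keys=("arch", "os"))
for summary, members in subclusters.items():
    print(dict(summary), members)
```

Here node01 and node02 fall into the same subcluster even
though their memory sizes differ, because ram_mb is not
among the chosen attributes.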

From our point of view, the cluster definition should
be refined by enforcing the property of being a system.
This implies that some functionality is provided. In
the cluster case, the provided functionality is the
ability to execute jobs. Therefore, all nodes managed
by the same batch system form a single cluster.



B. Storage Systems 

In a Grid environment, storage systems can vary
in complexity from a single disk server to hierarchical
mass storage systems. Within the first phase of the
GLUE Schema activity, the effort was mainly oriented
towards modelling the service component (see
Section III B) rather than the system component. The
modelled storage system is called Storage Library and
represents the machine providing the storage manager
service. This entity presents the file system component
offered to the service, an architecture component
and a performance component.

In our opinion, this concept should be refined.
As for the cluster system, a storage system should
allow the representation of all entities participating in
the service.



III. MODELLING SERVICES 

A service can be defined as an activity that performs
some task. Among the core grid services, the computing
service and the storage manager service have been
defined. Each modelled service has a unique identifier, a
human-readable name, a set of policies, a set of access
rights and a state.

Regarding access rights, one of the main design
guidelines was to move from the current practice of
user-grained access rights to virtual organization-grained
ones. This approach is beneficial for authorization
management in a distributed environment such as the
Grid. The idea is that virtual organizations set up
agreements with service providers. When a service is
requested by a user, both the user's membership and
capabilities are verified using organization-based
authorization services. With this approach, local
resources no longer need to maintain and publish the
list of authorized user identities. The two main
proposals in this area are the Virtual Organization
Membership Service (VOMS) [19], developed in the
context of the EDG-EDT collaboration, and the
Community Authorization Service (CAS) [22], developed
by the Globus Project.
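
The common service attributes and the virtual
organization-grained access check can be sketched as
follows. This is a minimal Python illustration under
stated assumptions: the field names, the VO names and the
is_authorized function are hypothetical, and the way
memberships are obtained and verified (e.g. through VOMS
or CAS) is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Common attributes of a modelled service, as listed in the text."""
    unique_id: str
    name: str
    policies: dict = field(default_factory=dict)
    # Access rights are expressed per virtual organization, not per user.
    authorized_vos: frozenset = frozenset()
    state: dict = field(default_factory=dict)


def is_authorized(service, user_vos):
    """Return True if the user belongs to at least one authorized VO.

    user_vos stands in for the memberships that an organization-based
    authorization service would assert for the requesting user.
    """
    return not service.authorized_vos.isdisjoint(user_vos)


ce = Service(
    unique_id="ce-01",
    name="short queue",
    authorized_vos=frozenset({"atlas", "cms"}),
)

print(is_authorized(ce, {"cms"}))
print(is_authorized(ce, {"lhcb"}))
```

The service publishes only the set of authorized virtual
organizations, so no list of individual user identities
appears anywhere in the service description.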

A. Computing Service 

By computing service, we mean a service able to
provide computing power to an application with a
certain quality. Within the GLUE Schema, the modelled
service is called Computing Element (CE) and maps
one-to-one to an entry point into a batch
queueing system. Essentially, a Computing Element
represents a queue of a local resource management
system, such as PBS or LSF. Since it is a service, it
presents policy, state and access rights attributes.

The CE concept was already present in the EDG
Schema. The GLUE Schema introduced the separation
between system-related and service-related information.

In order to perform a proper service selection,
the matchmaking process needs some data about the
system providing the service (e.g. hosting operating
system, available software packages). Moreover, these
data should be provided as an aggregate description of
the part of the system that can participate in the
service functionality (e.g. in a cluster, only a subset
of nodes may be assigned to a computing service).
Such flexibility is not well modelled at present.
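
To make the role of the aggregate system description
concrete, the following Python sketch shows a simple
matchmaking step over Computing Elements. It is an
assumption-laden illustration, not the EDG Broker
algorithm: the field names, the status values, the
identifiers and the ranking by free CPUs are all chosen
for this example.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    """A batch queue plus a summary of the nodes that serve it."""
    unique_id: str
    status: str                # service state (illustrative values)
    free_cpus: int             # service state
    max_wall_time_min: int     # service policy
    authorized_vos: frozenset  # VO-grained access rights
    os_name: str               # aggregate system information
    software: frozenset        # aggregate system information


def match(ces, vo, os_name, needed_software, wall_time_min):
    """Filter on the job requirements, then rank by free CPUs."""
    candidates = [
        ce for ce in ces
        if ce.status == "Production"
        and vo in ce.authorized_vos
        and ce.os_name == os_name
        and needed_software <= ce.software   # subset test
        and wall_time_min <= ce.max_wall_time_min
    ]
    return sorted(candidates, key=lambda ce: ce.free_cpus, reverse=True)


ces = [
    ComputingElement("ce-a/short", "Production", 4, 60,
                     frozenset({"atlas"}), "Linux", frozenset({"root"})),
    ComputingElement("ce-b/long", "Production", 12, 2880,
                     frozenset({"atlas", "cms"}), "Linux",
                     frozenset({"root", "geant4"})),
    ComputingElement("ce-c/long", "Draining", 50, 2880,
                     frozenset({"atlas"}), "Linux",
                     frozenset({"root", "geant4"})),
]

ranked = match(ces, vo="atlas", os_name="Linux",
               needed_software=frozenset({"root"}), wall_time_min=30)
print([ce.unique_id for ce in ranked])
```

The os_name and software fields describe the nodes behind
the queue rather than the queue itself, which is why the
selection depends on an accurate aggregate of exactly the
nodes assigned to that service.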

B. Storage Manager Service 

By Storage Service, we mean a service whose
task is the management of storage extents. By Storage
Space, we mean a storage extent managed under
a uniform set of policies and having the same access
rights. Stored files can be accessed by means of Data
Access Protocols (e.g. GridFTP, rfio).

Currently, the Storage Service has been modelled as
a generalization of:

• trivial file system

• Storage Resource Manager (SRM) [15]

• EDG Storage Element (SE) [20]

Current practice shows that some information is
missing. For instance, a Storage Space lacks
ownership information and a unique ID, possibly in the
form of a URL.
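
One possible shape for such an extended Storage Space is
sketched below in Python. This is only a suggestion under
stated assumptions: the field names, the srm:// identifier
and the host se.example.org are hypothetical and do not
come from the GLUE Schema.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class StorageSpace:
    """A storage extent under uniform policies and access rights."""
    unique_id: str             # proposed addition: a URL naming the space
    owner_vo: str              # proposed addition: ownership information
    authorized_vos: frozenset  # same access rights for the whole extent
    access_protocols: tuple    # e.g. ("gridftp", "rfio")
    available_kb: int

    def __post_init__(self):
        # Require an absolute URL (scheme and host) as the identifier.
        parsed = urlparse(self.unique_id)
        if not (parsed.scheme and parsed.netloc):
            raise ValueError(
                f"unique_id is not an absolute URL: {self.unique_id!r}"
            )


space = StorageSpace(
    unique_id="srm://se.example.org/atlas/space1",
    owner_vo="atlas",
    authorized_vos=frozenset({"atlas"}),
    access_protocols=("gridftp", "rfio"),
    available_kb=500_000_000,
)
print(space.unique_id, space.owner_vo)
```

Using a URL as the identifier makes the space addressable
independently of the machine that happens to host it.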



IV. IMPLEMENTATIONS 

Currently, the implementation-neutral description
of the GLUE Schema has been mapped into three
different data models:

• LDAP data model [17]

• Relational data model [21]

• XML data model [18]

The LDAP implementation has been done within
the DataTAG project and covers the full schema.
The testing phase has been carried out within the
DataTAG testbed, based on the EU DataGrid
middleware (release 1.4.x with Globus MDS 2.x). Strong
support has been given for the rewriting of both the
information providers and the EDG Broker interaction
with the MDS. The schema implementation is going to be
deployed as the base schema in the next releases of the
EDG, LCG and VDT grid middlewares. It has also
been contributed by INFN [10] to the Globus Project
under the signed Globus Contributor's License.

The Relational implementation has been developed
by DataGrid WP3 in the R-GMA [11].

The XML implementation has been developed by 
the Globus Project in Globus Toolkit 3.0 (at present, 
only cluster system and computing element). 



V. RELATED WORK 

The first resource information model introduced
within the HENP Grid community was the one that came
with the Globus MDS. It was mainly oriented
towards modelling computer systems and lacked
capabilities for modelling grid services. Conversely, the
former DataGrid schema was mainly oriented towards
creating abstractions for grid services. While this
presented new concepts useful for discovery and
matchmaking purposes, it did not clearly separate
systems from services.

Another work to mention is the NorduGrid
information model [16]. Its distinguishing feature is the
User entity, modelled in order to provide per-user
information, such as available storage space and processors.
This approach differs from the one taken within
the GLUE Schema, where general authorization rules
are on a per-organization basis.

It is also important to mention the outcomes of the
ongoing activities of the CIM Grid Schema Working
Group of the Global Grid Forum (GGF CGS-WG) [24].
Their approach targets not only discovery and
monitoring, but also the more complex task of resource
management. The chosen strategy is to extend the
industry-standard Common Information Model (CIM) [8].
While this approach provides detailed resource
descriptions and relationships, it needs a special
framework in order to offer the management interface.
For the purpose of distributed discovery, it needs to be
interfaced with the Grid Information Service. Due to
the wider spectrum of goals envisioned by CIM, the
information model is more complex.



VI. CONCLUSIONS AND FUTURE WORK 

In this paper we presented the results and the out- 
comes of the GLUE Schema activity. Besides present- 
ing the modelled entities, both at the system and at 
the service level, the current shortcomings and some 
ideas to address them have also been described. 

In the near future, the focus will be on the
refinement of both the subcluster and storage library
entities. We also envision the evolution of the current
proposal in order to model a general ancestor service.
Moreover, attention will be given to monitoring
requirements.



Acknowledgments 

The authors wish to thank all the people who have
participated in the GLUE Schema activity.



MOAT002

Computing in High Energy and Nuclear Physics, La Jolla, California, 24-28 March 2003



[1] "The Grid Laboratory Uniform Environment
(GLUE)", http://www.hicb.org/glue/glue.htm

[2] "Distributed Management Task Force",
http://www.dmtf.org/

[3] "Unified Modeling Language",
http://www.omg.org/uml

[4] "The DataTAG Project", http://www.datatag.org

[5] "The DataGrid Project", http://www.datagrid.org

[6] "The LHC Computing Grid Project",
http://www.cern.ch/lcg

[7] "The NorduGrid Project",
http://www.nordugrid.org

[8] "Common Information Model",
http://www.dmtf.org/standards/standard_cim.php

[9] "The Globus Toolkit",
http://www.globus.org/toolkit/

[10] "Istituto Nazionale di Fisica Nucleare",
http://www.infn.it

[11] S. Fisher et al., "R-GMA: First results after
deployment", Proceedings of the International Conference
for Computing in High Energy and Nuclear Physics
(CHEP 2003).

[12] "The MDS 2.x schema",
http://www.globus.org/mds/Schema.html

[13] "GLUE Schema documents",
http://www.cnaf.infn.it/~sergio/datatag/glue/index.htm

[14] K. Czajkowski, S. Fitzgerald, I. Foster, C. Kesselman,
"Grid Information Services for Distributed Resource
Sharing", Proceedings of the Tenth IEEE International
Symposium on High-Performance Distributed
Computing (HPDC-10), IEEE Press, August 2001.

[15] A. Shoshani, "Storage Resource Managers: Why
They Are Important to Data Grid Architecture",
White paper, 2001.

[16] B. Konya, "The NorduGrid information system",
16/09/2002.

[17] S. Andreozzi, "GLUE Schema implementation for the
LDAP model", Draft, 29/05/2003.

[18] The Globus Project, "GLUE Schema implementation
for the XML data model", Draft, 29/05/2003.

[19] R. Alfieri, R. Cecchini, V. Ciaschini, L. dell'Agnello,
A. Frohner, A. Gianoli, K. Lorentey and F. Spataro,
"An Authorization System for Virtual Organization",
1st European Across Grids Conference, 13-14 Feb
2003.

[20] J.C. Gordon et al., "Architecture and Design for
Mass Storage Management", DataGrid-05-D5.2-0141-3-4,
2002.

[21] EU DataGrid WP3, "GLUE Schema implementation
for the relational data model",
http://hepunx.rl.ac.uk/edg/wp3/documentation/doc/schemas/

[22] L. Pearlman, V. Welch, I. Foster, C. Kesselman and
S. Tuecke, "A Community Authorization Service for
Group Collaboration", IEEE Workshop on Policies
for Distributed Systems and Networks, 2002.

[23] "The Condor Project",
http://www.cs.wisc.edu/condor/

[24] E. Stokes, L. Flon, "Job Submission Information
Model", GGF Proposed Standard, 11 Jun 2003.






